IN THE CLAIMS 



1. (Currently Amended) A method comprising:

responsive to receiving a single packed shuffle instruction designating, with 3 

bits, a first register storing a first operand having a set of L data elements 
and designating, with 3 bits, a second register storing a second operand 
having a set of L control elements, wherein the first operand and second 
operand are of a same size and each of the L data elements and L control 
elements are of a same size, and wherein each one of the L control 
elements is divided into three portions, the first portion being a flush to 
zero bit occupying the most significant bit of each control element 
wherein the flush to zero bit alone controls whether a resultant element is 
flushed to zero, the second portion being a position selection field that is at
least log2L bits wide and indicates a position of one of said L data
elements, and a third portion, storing a resultant operand in said first 
register having L resultant data elements of the same size as the L data 
elements and the L control elements, wherein the value of each resultant 
data element is controlled by the position selection field of the L control 
elements in the same position as the resultant data element, and is either, 
the one of the L data elements designated by the position selection field of 

said control element if said control element's flush to zero bit is 

not set; or 

a zero if said control element's flush to zero bit is set. 
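
For illustration only, the operation recited in claim 1 can be modeled as the short sketch below. It assumes byte-wide elements (as in claims 8 and 9), assumes that the position selection field occupies the low log2L bits of each control element, and treats the remaining third portion as ignored; the claim itself fixes only the location of the flush to zero bit. The function name `packed_shuffle` is hypothetical.

```python
def packed_shuffle(data, controls):
    """Illustrative model of the claimed shuffle for byte-wide elements."""
    L = len(data)
    if len(controls) != L:
        raise ValueError("operands must hold the same number of elements")
    if L == 0 or L & (L - 1):
        raise ValueError("this sketch assumes L is a power of two")
    index_mask = L - 1  # low log2(L) bits: assumed placement of the selection field
    result = []
    for c in controls:
        if c & 0x80:
            # Flush to zero bit (most significant bit) is set: result is zero.
            result.append(0)
        else:
            # Otherwise copy the data element at the selected position.
            result.append(data[c & index_mask])
    return result


data = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
controls = [0x07, 0x80, 0x00, 0x03, 0x83, 0x01, 0x01, 0x06]
print([hex(b) for b in packed_shuffle(data, controls)])
```

In this model the control element 0x83 yields zero even though its selection field names position 3, reflecting that the flush to zero bit alone decides whether the resultant element is zeroed.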

2. (Cancelled)



3. (Cancelled) 

4. (Previously Presented) The method of claim 1 wherein said control 
element is to designate a first operand data element by a data element position number. 

5. (Cancelled) 

6. (Cancelled) 

7. (Previously Presented) The method of claim 1 further comprising 
outputting a resultant data block comprising data that was shuffled from said first 
operand in response to said control elements of said second operand. 

8. (Original) The method of claim 1 wherein each of said data elements 
comprises a byte of data. 

9. (Original) The method of claim 8 wherein each of said control elements is 
a byte wide. 

10. (Original) The method of claim 9 wherein L is 8 and wherein said first 
operand, said second operand, and said resultant are each comprised of 64-bit wide
packed data. 

11. (Original) The method of claim 9 wherein L is 16 and wherein said first
operand, said second operand, and said resultant are each comprised of 128-bit wide 
packed data. 
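
As a worked illustration of the operand widths in claims 10 and 11, the sketch below treats each packed operand as a single integer of 8 × L bits (64 bits for L = 8, 128 bits for L = 16). It assumes that element position 0 sits in the least significant byte and that the position selection field is the low log2L bits of each control byte; neither ordering is stated in the claims. All function names are hypothetical.

```python
def unpack_bytes(value, count):
    """Split a packed integer into `count` bytes, position 0 in the low byte."""
    return [(value >> (8 * i)) & 0xFF for i in range(count)]


def pack_bytes(elements):
    """Reassemble byte elements into one packed integer, position 0 lowest."""
    out = 0
    for i, b in enumerate(elements):
        out |= (b & 0xFF) << (8 * i)
    return out


def shuffle_packed(first, second, count):
    """Shuffle `first` under control of `second`; count is L (8 or 16 here)."""
    data = unpack_bytes(first, count)
    controls = unpack_bytes(second, count)
    result = [0 if c & 0x80 else data[c & (count - 1)] for c in controls]
    return pack_bytes(result)


# L = 8: 64-bit operands. Data element i holds the value i.
first64 = 0x0706050403020100
second64 = 0x8000010203040506  # top control byte has its flush to zero bit set
print(hex(shuffle_packed(first64, second64, 8)))

# L = 16: 128-bit operands; the controls reverse the byte order.
first128 = int.from_bytes(bytes(range(16)), "little")
second128 = int.from_bytes(bytes(range(15, -1, -1)), "little")
print(hex(shuffle_packed(first128, second128, 16)))
```

The first operand, the second operand, and the resultant all have the same width in each case, matching the same-size limitation of claim 1.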

12. (Currently Amended) An apparatus comprising: 

an execution unit to execute a single packed shuffle instruction designating, with 
3 bits, a first register storing a first operand comprised of a set of L data 
elements and designating, with 3 bits, a second register storing a second 
operand comprised of a set of L control elements, wherein the first 
operand and second operand are of a same size and each of the L data 
elements and L control elements are of a same size, and wherein each one 
of the L control elements is divided into three portions, the first portion 
being a flush to zero bit occupying the most significant bit of each control 
element wherein the flush to zero bit alone controls whether a resultant 
element is flushed to zero, the second portion being a position selection
field that is at least log2L bits wide and indicates a position of one of said 
L data elements, and a third portion, said shuffle instruction to cause said 
execution unit to store a resultant operand in said first register having L 
resultant data elements of the same size as the L data elements and the L 
control elements, wherein the value of each resultant data element is 
controlled by the position selection field of the L control elements in the
same position as the resultant data element, and is either, 
a zero if said control element's flush to zero bit is true, otherwise 
the one of the L data elements designated by the position selection field of 
said individual control element. 

13. (Original) The apparatus of claim 12 wherein each of said L control 
elements occupies a position in said second operand and is associated with a similarly 
located data element position in a resultant. 

14. (Original) The apparatus of claim 13 wherein each individual control 
element is to designate a first operand data element by a data element position number. 

15. (Cancelled) 

16. (Cancelled) 

17. (Previously Presented) The apparatus of claim 12 wherein said shuffle 
instruction is to further cause said execution unit to generate a resultant having L data 
element positions that have been filled based on said set of L control elements. 

18. (Original) The apparatus of claim 12 wherein each of said data elements 
comprises a byte of data and each of said control elements is a byte wide. 

19. (Original) The apparatus of claim 18 wherein L is 8 and wherein said first
operand, said second operand, and said resultant are each comprised of 64-bit wide 
packed data. 

20. (Original) The apparatus of claim 18 wherein L is 16 and wherein said 
first operand, said second operand, and said resultant are each comprised of 128-bit wide 
packed data. 

21. (Currently Amended) An article of manufacture comprising a machine
readable storage medium that stores data, that when accessed by a machine, causes the 
machine to perform operations comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3 

bits, a first register storing a first operand having a set of L data elements 
and designating, with 3 bits, a second register storing a second operand 
having a set of L control elements, wherein the first operand and second 
operand are of a same size and each of the L data elements and L control 
elements are of a same size, and wherein each one of the L control 
elements is divided into three portions, the first portion being a flush to 
zero bit occupying the most significant bit of each control element 
wherein the flush to zero bit alone controls whether a resultant element is 
flushed to zero, the second portion being a position selection field that is at 
least log2L bits wide and indicates a position of one of said L data
elements, and a third portion, storing a resultant operand in said first
register having L resultant data elements of the same size as the L data 
elements and the L control elements, wherein the value of each resultant 
data element is controlled by the position selection field of the L control 
elements in the same position as the resultant data element, and is either, 
the one of the L data elements designated by the position selection field of 

said control element if said control element's flush to zero bit is 

not set; or 

a zero if said control element's flush to zero bit is set. 

22. (Previously Presented) The article of manufacture of claim 21 wherein 
said data stored by said machine readable storage medium represents an integrated circuit 
design, which when fabricated performs said predetermined function in response to a 
single instruction. 

23. (Previously Presented) The article of manufacture of claim 22 wherein 
said machine readable storage medium further includes data, that causes the machine to 
perform operations further comprising: 

generating a resultant having L data element positions that have been filled in
accordance to said set of L control elements. 

24. (Currently Amended) The article of manufacture comprising the machine 
readable storage medium of claim 23 wherein each of said L control elements is 
associated with a similarly located data element position in a resultant.



25. (Currently Amended) The article of manufacture comprising the machine 
readable storage medium of claim 24 wherein each individual control element is to 
designate a first operand data element by a data element position number. 

26. (Currently Amended) The article of manufacture comprising the machine 
readable storage medium of claim 25 wherein each of said data elements comprises a 
byte of data. 

27. (Cancelled) 

28. (Cancelled) 

29. (Previously Presented) The article of manufacture of claim 21 wherein 
said data stored by said machine readable storage medium represents a computer 
instruction, which, if executed by a machine, causes said machine to perform said 
predetermined function. 

30. (Currently Amended) A method comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3 

bits, a first register storing a first operand having a set of L data elements 
and designating, with 3 bits, a second register storing a second operand 
having a set of L masks, wherein the first operand and second operand are
of a same size and each of the L data elements and L masks are of a same 
size, and wherein each one of the L masks is divided into three portions, 
the first portion being a flush to zero bit occupying the most significant bit 
of each control element wherein the flush to zero bit alone controls 
whether a resultant element is flushed to zero, the second portion being a
position selection field that is at least log2L bits wide and indicates a 
position of one of said L data elements, and a third portion, and wherein 
each of said L masks occupies a particular position in said second operand 
and is associated with a similarly located data element position in a 
resultant operand, storing the resultant operand in said first register having 
L resultant data elements of the same size as the L data elements and the L 
masks, wherein the value of each resultant data element is controlled by 
the position selection field of the L masks in the same position as the 
resultant data element, and is either, 
a zero if said mask's flush to zero bit is set; or 

if said mask's flush to zero bit is not set, the one of the L data elements 
designated by the position selection field of said mask to said 
associated resultant data element position. 

31.-33. (Cancelled) 

34. (Previously Presented) The method of claim 30 wherein said first operand, 
said second operand, and said resultant are each comprised of 64-bit wide packed data.



35. (Previously Presented) The method of claim 30 wherein said first operand, 
said second operand, and said resultant are each comprised of 128-bit wide packed data. 

36. (Currently Amended) A method comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3 

bits, a first register storing a first operand having a set of L data elements 
and designating, with 3 bits, a second register storing a second operand 
having a set of L shuffle masks, wherein the first operand and second 
operand are of a same size and each of the L data elements and L masks 
are of a same size, and wherein each one of the L shuffle masks is divided 
into three portions, the first portion being a flush to zero bit occupying the 
most significant bit of each control element wherein the flush to zero bit 
alone controls whether a resultant element is flushed to zero, the second
portion being a position selection field that is at least log2L bits wide and 
indicates a position of one of said L data elements, and a third portion, and 
wherein each of said L shuffle masks is associated with a similarly located 
data element position in a resultant operand, storing the resultant operand 
in said first register having L resultant data elements of the same size as 
the L data elements and the L masks, wherein the value of each resultant 
data element is controlled by the position selection field of the L 
individual masks in the same position as the resultant data element, and is 
either,

a zero if said mask's flush to zero bit is set, otherwise 

the one of the L data elements designated by the position selection field of 

said individual shuffle mask to said associated resultant data 

element position. 

37. (Cancelled) 

38. (Cancelled) 

39. (Currently Amended) An apparatus comprising: 

a first memory location to store a plurality of source data elements; 

a second memory location to store a plurality of control elements, each of said 
control elements to correspond to a resultant data element position, and 
wherein each one of said control elements is divided into three portions, 
the first portion being a flush to zero bit occupying the most significant bit 
of each control element wherein the flush to zero bit alone controls 
whether a resultant element is flushed to zero, the second portion being a
position selection field that is at least log2L bits wide and indicates a
position of one of said L data elements, and a third portion; 

control logic coupled to said first memory location and said second memory 
location, said control logic in response to the receipt of a single packed
shuffle instruction designating, with three bits, a first memory location 
storing a first operand having a set of L data elements and designating a
second memory location storing a second operand having a set of L 
control elements, wherein the first operand and the second operand are of 
a same size and each of the L data elements and L control elements are of
a same size, to generate a plurality of selection signals and a plurality of 
flush to zero signals, a zero signal generated when a control element's 
flush to zero bit is set; 

a first plurality of multiplexers coupled to said first memory location and said 

plurality of selection signals, each of said first plurality of multiplexers to 
store a resultant operand in said first memory location having L resultant 
data elements of the same size as the L data elements and the L control 
elements, wherein the value of each resultant data element is controlled by 
the position selection signal of the L control elements in the same position 
as the resultant data element, and is the one of the L data elements for a 
specific resultant data element position in response to a selection signal 
corresponding to said specific resultant data element position; and 

a second plurality of multiplexers coupled to said first plurality of multiplexers 
and to said plurality of flush to zero signals, each of said second plurality 
of multiplexers associated with a specific resultant data element position, 
each of said second plurality of multiplexers to output a zero if its flush to 
zero signal is active or to output a data element shuffled for that specific 
resultant data element position. 
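
For illustration only, the two multiplexer stages recited in claim 39 can be modeled as below: control logic derives one selection signal and one flush to zero signal per control element, a first stage selects a source data element for each resultant position, and a second stage outputs either zero or the shuffled element. The sketch assumes byte-wide control elements with the selection field in the low log2L bits; the function names are hypothetical and do not appear in the claims.

```python
def control_logic(controls, L):
    """Derive per-position selection signals and flush to zero signals."""
    select = [c & (L - 1) for c in controls]    # assumed low-bit selection field
    flush = [bool(c & 0x80) for c in controls]  # most significant bit
    return select, flush


def first_stage_mux(source, select):
    """First plurality of multiplexers: pick a source element per position."""
    return [source[s] for s in select]


def second_stage_mux(shuffled, flush):
    """Second plurality of multiplexers: zero when the flush signal is active."""
    return [0 if f else v for v, f in zip(shuffled, flush)]


source = [0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7]
controls = [0x03, 0x82, 0x07, 0x00, 0x80, 0x05, 0x05, 0x01]

select, flush = control_logic(controls, len(source))
shuffled = first_stage_mux(source, select)
resultant = second_stage_mux(shuffled, flush)
print([hex(b) for b in resultant])
```

Splitting the work this way means the first stage always produces a shuffled element, and the flush to zero signal only gates the final output for its resultant data element position.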

40. (Original) The apparatus of claim 39 wherein said plurality of source data
elements is a first packed data operand. 

41. (Original) The apparatus of claim 40 where said plurality of control
elements is a second packed data operand. 

42. (Original) The apparatus of claim 40 wherein said first and second 
memory locations are single instruction multiple data registers.

43. (Original) The apparatus of claim 42 wherein: 

said first packed operand is 64 bits long and each of said source data 

elements is a byte wide; and 
said second packed operand is 64 bits long and each of said control 
elements is a byte wide. 

44. (Original) The apparatus of claim 42 wherein: 

said first packed operand is 128 bits long and each of said source data 

elements is a byte wide; and 
said second packed operand is 128 bits long and each of said control 

elements is a byte wide. 

45. (Currently Amended) An apparatus comprising: 

control logic to receive a single packed shuffle instruction designating, with three 
bits, a first memory location storing a first operand having a set of M data
elements and designating, with three bits, a second memory location 
storing a second operand having a set of L shuffle masks, wherein each of 
the M data elements and L shuffle masks are of a same size, and wherein 
each one of the L shuffle masks is divided into three portions, the first 
portion being a flush to zero bit occupying the most significant bit of each 
shuffle mask wherein the flush to zero bit alone controls whether a 
resultant element is flushed to zero, the second portion being a position
selection field that is at least log2L bits wide, and a third portion, and 
wherein each shuffle mask is associated with a unique resultant data 
element position controlled by the position selection field of said shuffle 
mask, said control logic to provide a select signal and a flush to zero 
signal for each resultant data element position; 
a set of L multiplexers coupled to said control logic, wherein each multiplexer is 
also associated with a unique resultant data element position, each 
multiplexer to output to said first memory location either, 
a zero if said shuffle mask's flush to zero signal is active or the one of the 
M data elements designated by the select signal of said shuffle 
mask if said shuffle mask's flush to zero signal is inactive.

46. (Original) The apparatus of claim 45 further comprising a register with L 
unique data element positions, each data element position to hold an output from its 
associated multiplexer. 

47. (Original) The apparatus of claim 46 wherein L is 16 and M is 16.

48. (Currently Amended) A system comprising: 
a memory to store data and instructions; 

a processor coupled to said memory on a bus, said processor operable to perform 
a shuffle operation, said processor comprising: 
a bus unit to receive a single packed shuffle instruction, from said 

memory, said instruction to designate, with 3 bits, a first register 
storing L data elements from a first operand, and to designate, with 
three bits, L shuffle control elements from a second operand, 
wherein the first operand and second operand are of a same size 
and each of the L data elements and L control elements are of a
same size, and wherein each one of the L control elements is
divided into three portions, the first portion being a flush to zero 
bit occupying the most significant bit of each control element 
wherein the flush to zero bit alone controls whether a resultant 
element is flushed to zero, the second portion being a position
selection field that is at least log2L bits wide and indicates a 
position of one of the L data elements, and a third portion; 
an execution unit coupled to said bus unit, said execution unit to execute said 

single packed shuffle instruction, said single packed shuffle instruction to 
cause said execution unit to: 
store a resultant operand in said first register having L resultant data

elements of the same size as the L data elements and the L control 
elements, wherein the value of each resultant data element is 
controlled by the position selection field of the L control elements 
in the same position as the resultant data element, and is either, 
the one of the L data elements designated by the position selection 
field of said control element if said control element's flush to zero 
bit is not set; or 

a zero if said control element's flush to zero bit is set. 



49.-51. (Cancelled) 



52. (Original) The system of claim 48 wherein each data element is a byte 
wide, each shuffle command element is a byte wide, and L is 8. 



53. (Original) The system of claim 48 wherein said first operand is 64 bits 
long and said second operand is 64 bits long. 